Loop Refactor 1/N - Training Loop #7871
Conversation
What a kickass PR!
Hey @awaelchli, it is not a blocker for this PR, but there is a small inconsistency in the loop right now. Best.
Great observation @tchaton, moved to on_run_start!
pytorch_lightning/trainer/connectors/logger_connector/logger_connector.py
if epoch_output is None:
    return
@awaelchli do you remember why this was necessary?
edit: checking here #9298
There is a good reason, something along the lines of my other comment. Pretty sure without this a test will fail; I would not have added it otherwise. But I should have added a comment. Deciphering the handling of these edge cases was definitely a central point of frustration for these loop refactors.
Pretty sure it's this: https://pytorch-lightning.readthedocs.io/en/latest/common/lightning_module.html?highlight=skip%20epoch#on-train-batch-start. Returning -1 from on_train_batch_start skips the rest of the epoch.
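For context, a minimal sketch of that escape hatch (the skip condition here is purely illustrative, and the exact hook signature may vary between Lightning versions):

import pytorch_lightning as pl
import torch


class SkipEpochExample(pl.LightningModule):
    def __init__(self):
        super().__init__()
        self.layer = torch.nn.Linear(32, 2)

    def training_step(self, batch, batch_idx):
        return self.layer(batch).sum()

    def configure_optimizers(self):
        return torch.optim.SGD(self.parameters(), lr=0.1)

    def on_train_batch_start(self, batch, batch_idx, dataloader_idx=0):
        # Returning -1 from this hook tells the trainer to skip the rest of
        # the current epoch. If that happens on the very first batch, the
        # epoch produces no output, which is the edge case guarded below.
        if self.current_epoch == 1:  # illustrative condition
            return -1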
if self.batches_seen == 0:
    # dataloader/iterator did not produce a batch
    return
@awaelchli do you remember why this was necessary?
edit: checking here #9298
Yes, basically if the epoch is empty we don't want to run the epoch_end hooks or invoke any logging. When does this happen? A user can stop training or skip an epoch by returning a signal from one of the hooks; I don't remember the exact use case. Pretty sure if you remove this line a test will fail and tell you exactly why.
What does this PR do?
Part of #7700
Introduces the new loop interface.
Old classes will be removed ASAP in a follow-up.
Co-author: @justusschock
How to review this PR
pytorch_lightning/loops/base.py
The new training loop classes re-organize the old TrainLoop class.
Important: Change requests on code lines that are carried over 1:1 from the old TrainLoop class will be dismissed; we cannot address those comments here. More refactor iterations will be required to simplify all the existing helper functions.
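For orientation, the shape of the new interface in pytorch_lightning/loops/base.py is roughly the following (a simplified sketch of the pattern, not the exact code in this PR; hook names and signatures are abbreviated):

from abc import ABC, abstractmethod
from typing import Any


class Loop(ABC):
    """Sketch: run() drives reset -> on_run_start -> advance-until-done -> on_run_end."""

    @property
    @abstractmethod
    def done(self) -> bool:
        """Whether the loop has finished, e.g. max batches or epochs reached."""

    @abstractmethod
    def reset(self) -> None:
        """Reset internal state before the loop starts or restarts."""

    @abstractmethod
    def advance(self, *args: Any, **kwargs: Any) -> None:
        """Run a single iteration of the loop: one batch, one epoch, ..."""

    def on_run_start(self, *args: Any, **kwargs: Any) -> None:
        pass

    def on_advance_end(self) -> None:
        pass

    def on_run_end(self) -> Any:
        pass

    def run(self, *args: Any, **kwargs: Any) -> Any:
        self.reset()
        self.on_run_start(*args, **kwargs)
        while not self.done:
            self.advance(*args, **kwargs)
            self.on_advance_end()
        return self.on_run_end()

Concrete loops (fit, epoch, batch) then roughly compose by having an outer loop's advance() call run() on the loop it owns.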
TODO:
More follow-up work is listed in this issue tracker: #7938
Before submitting
PR review
Anyone in the community is free to review the PR once the tests have passed.
Before you start reviewing, make sure you have read the Review guidelines.
Did you have fun?
Make sure you had fun coding 🙃
Fixes #7034